Smart Paper Notes

VGStore: A Multimodal Extension to SPARQL for Querying RDF Scene Graph

Yanzeng Li , Zilong Zheng , Wenjuan Han , Lei Zou

Category: Natural Language Processing

2022-09-07

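Semantic Web technology has successfully facilitated many RDF models with rich data representation methods. It also has the potential to represent and store multimodal knowledge bases, such as multimodal scene graphs. However, most existing query languages, especially SPARQL, barely explore implicit multimodal relationships such as semantic similarity and spatial relations. We first explore this problem by organizing a large-scale scene graph (namely, Visual Genome) in an RDF graph database. Based on the proposed RDF-stored multimodal scene graph, we extend SPARQL queries to answer questions that involve relational reasoning about color, space, and similar attributes. A further demo (VGStore) shows the effectiveness of customized queries and of displaying multimodal data.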

translated by Google Translate
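The idea of querying "implicit" spatial relations can be made concrete with a small sketch. This is a minimal Python illustration under stated assumptions and is not VGStore's implementation. Scene-graph objects are stored as subject-predicate-object triples, and the `vg:*` predicate names are hypothetical. A relation such as "left of" is not stored as a triple. Instead it is computed at query time from bounding-box triples, similar in spirit to a custom SPARQL filter function.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, object]

# Toy RDF-style scene graph; the vg:* predicate names are hypothetical.
TRIPLES: List[Triple] = [
    ("vg:obj1", "vg:name", "cup"),
    ("vg:obj1", "vg:color", "red"),
    ("vg:obj1", "vg:bbox", (10, 40, 30, 30)),  # (x, y, width, height)
    ("vg:obj2", "vg:name", "plate"),
    ("vg:obj2", "vg:color", "white"),
    ("vg:obj2", "vg:bbox", (60, 45, 50, 20)),
    ("vg:obj3", "vg:name", "fork"),
    ("vg:obj3", "vg:color", "silver"),
    ("vg:obj3", "vg:bbox", (120, 50, 10, 40)),
]

def objects_with(pred: str, value: object) -> Set[str]:
    """Subjects s such that (s, pred, value) is in the graph (a basic triple pattern)."""
    return {s for s, p, o in TRIPLES if p == pred and o == value}

def value_of(subj: str, pred: str) -> object:
    """First object of (subj, pred, ?o); assumes the property exists."""
    return next(o for s, p, o in TRIPLES if s == subj and p == pred)

def left_of(a: str, b: str) -> bool:
    """Implicit spatial relation: a's box ends before b's box starts on the x-axis."""
    ax, _, aw, _ = value_of(a, "vg:bbox")
    bx, _, _, _ = value_of(b, "vg:bbox")
    return ax + aw <= bx

def query_left_of(color: str, target_name: str) -> List[str]:
    """Rough analogue of: SELECT ?n WHERE { ?a vg:color color . ?a vg:name ?n .
    ?b vg:name target_name . FILTER(left_of(?a, ?b)) }"""
    results = []
    for a in sorted(objects_with("vg:color", color)):
        for b in sorted(objects_with("vg:name", target_name)):
            if a != b and left_of(a, b):
                results.append(value_of(a, "vg:name"))
    return results

print(query_left_of("red", "plate"))  # prints ['cup']
```

Because the relation is derived on demand, adding new spatial predicates only requires new filter functions. The stored graph stays unchanged.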

gBuilder: A Scalable Knowledge Graph Construction System for Unstructured Corpus

Yanzeng Li , Lei Zou

Category: Natural Language Processing

2022-08-20

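We design a user-friendly and scalable knowledge graph construction (KGC) system for extracting structured knowledge from unstructured corpora. Unlike existing KGC systems, gBuilder provides a flexible, user-defined pipeline that can keep pace with the rapid development of information extraction (IE) models. Additional built-in template-based or heuristic operators, as well as programmable operators, are available for adapting to data from different domains. In addition, we design a cloud-based self-adaptive task scheduler for gBuilder to ensure its scalability on large-scale knowledge graph construction. Experimental evaluation not only demonstrates gBuilder's ability to organize multiple information extraction models in a unified platform, but also confirms its high scalability on large-scale KGC tasks.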
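The sketch below shows how a user-defined pipeline can chain template-based, heuristic, and programmable (user-supplied) operators into one KGC flow. It assumes that each operator takes a document state and returns an updated state. The operator names, the `Doc` type, and the regex templates are hypothetical illustrations and do not reflect gBuilder's actual API.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

@dataclass
class Doc:
    text: str
    sentences: List[str] = field(default_factory=list)
    triples: List[Triple] = field(default_factory=list)

Operator = Callable[[Doc], Doc]  # hypothetical operator interface

def split_sentences(doc: Doc) -> Doc:
    """Heuristic operator: naive sentence splitting on periods."""
    doc.sentences = [s.strip() for s in doc.text.split(".") if s.strip()]
    return doc

def template_extractor(pattern: str, relation: str) -> Operator:
    """Template-based operator factory: a regex with two groups yields (head, relation, tail)."""
    regex = re.compile(pattern)

    def op(doc: Doc) -> Doc:
        for sent in doc.sentences:
            for m in regex.finditer(sent):
                doc.triples.append((m.group(1), relation, m.group(2)))
        return doc

    return op

def dedupe(doc: Doc) -> Doc:
    """Programmable (user-defined) operator: drop duplicate triples, keeping first-seen order."""
    doc.triples = list(dict.fromkeys(doc.triples))
    return doc

def run_pipeline(text: str, ops: List[Operator]) -> List[Triple]:
    """Apply operators in sequence, as configured by the user."""
    doc = Doc(text)
    for op in ops:
        doc = op(doc)
    return doc.triples

pipeline = [
    split_sentences,
    template_extractor(r"(\w+) was born in (\w+)", "born_in"),
    template_extractor(r"(\w+) works at (\w+)", "works_at"),
    dedupe,
]
text = "Alice was born in Paris. Alice works at Acme. Alice was born in Paris."
print(run_pipeline(text, pipeline))
```

Under this design, a learned IE model could be wrapped as one more `Operator` without changing the rest of the pipeline.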

REZCR: A Zero-shot Character Recognition Method via Radical Extraction

Xiaolei Diao , Daqian Shi , Hao Tang , Lei Wu , Yanzeng Li , Hao Xu

Category: Computer Vision

2022-07-12

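The long-tail effect is a common problem that limits the performance of deep learning models on real-world datasets. Because characters are used with very different frequencies, character image datasets also suffer from this imbalanced data distribution. As a result, current character recognition methods are limited when applied to real-world datasets. This is especially true for tail categories that lack training samples, such as uncommon characters or characters from historical documents. In this paper, we propose REZCR, a zero-shot character recognition framework via radical extraction, to improve recognition performance on few-sample character categories. It exploits information about radicals, the graphical units of characters, by decomposing characters and reconstructing them according to their orthography. REZCR consists of an attention-based radical information extractor (RIE) and a knowledge-graph-based character reasoner (KGR). The RIE aims to recognize candidate radicals and their possible structural relations from character images. Its output is fed into the KGR, which recognizes the target character by reasoning over a pre-designed character knowledge graph. We validate our method on multiple datasets, where REZCR shows promising experimental results, especially on few-sample character datasets.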
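This sketch covers only the reasoning step. It assumes the extractor outputs candidate radicals with confidence scores plus a predicted structure label. The tiny knowledge graph and the scoring rule are hypothetical illustrations and are not REZCR's KGR. The rule takes the average radical confidence and applies a penalty when the structure disagrees. Because characters are matched through their radicals, a character with no training images can still be ranked, provided its decomposition is in the graph.

```python
from typing import Dict, List, Tuple

# Hypothetical character knowledge graph: character -> (structure, radicals).
CHAR_KG: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "好": ("left-right", ("女", "子")),
    "明": ("left-right", ("日", "月")),
    "休": ("left-right", ("亻", "木")),
    "字": ("top-bottom", ("宀", "子")),
    "李": ("top-bottom", ("木", "子")),
}

def rank_characters(radical_scores: Dict[str, float], structure: str,
                    structure_penalty: float = 0.5) -> List[Tuple[str, float]]:
    """Score each character by the mean confidence of its radicals
    (0 for radicals the extractor did not propose), scaled down when the
    predicted structure disagrees with the one stored in the graph."""
    ranked = []
    for char, (struct, radicals) in CHAR_KG.items():
        score = sum(radical_scores.get(r, 0.0) for r in radicals) / len(radicals)
        if struct != structure:
            score *= structure_penalty
        ranked.append((char, round(score, 3)))
    # Highest score first; ties broken by character for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))

# Hypothetical extractor output for one character image.
candidates = {"木": 0.9, "子": 0.8, "亻": 0.3}
print(rank_characters(candidates, "top-bottom")[0])  # prints ('李', 0.85)
```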

Crake: Causal-Enhanced Table-Filler for Question Answering over Large Scale Knowledge Base

Minhao Zhang , Ruoyu Zhang , Yanzeng Li , Lei Zou

Category: Natural Language Processing

2022-07-08

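Semantic parsing solves knowledge base (KB) question answering (KBQA) by composing a KB query. This usually involves node extraction (NE) and graph composition (GC), which detect and connect the relevant nodes in the query. Despite the strong causal effects between NE and GC, previous works fail to model this causality directly in their pipelines, which hinders learning the correlations between subtasks. In addition, the sequence-generation process for GC in previous works induces ambiguity and exposure bias, further hurting accuracy. In this work, we formalize semantic parsing into two stages. In the first stage (graph structure generation), we propose a causal-enhanced table-filler to overcome the problems of sequence models and to learn the internal causalities. In the second stage (relation extraction), we present an efficient beam-search algorithm to scale complex queries to large-scale KBs. Experiments on LC-QuAD 1.0 show that our method surpasses the previous state of the art by a large margin (17%) while remaining time- and space-efficient. Code and models are available at https://github.com/aozmh/crake.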
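The second stage uses beam search to assign relations to query edges. This is a generic beam-search sketch of that idea and is not Crake's algorithm, which the abstract does not detail. Given a query graph skeleton with several edges, each edge needs a KB relation. Instead of scoring every combination, the search keeps only the top-k partial assignments after each edge. The relation names and probabilities are hypothetical stand-ins for a learned relation scorer.

```python
import math
from typing import Dict, List, Tuple

Beam = List[Tuple[Tuple[str, ...], float]]  # (relations chosen so far, summed log-prob)

def beam_search(edge_candidates: List[Dict[str, float]], beam_size: int) -> Beam:
    """Assign one relation per edge, keeping the beam_size best partial
    assignments (by summed log-probability) after each edge."""
    beam: Beam = [((), 0.0)]
    for candidates in edge_candidates:
        expanded = [
            (path + (rel,), score + math.log(p))
            for path, score in beam
            for rel, p in candidates.items()
        ]
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_size]  # prune to the top-k partial queries
    return beam

# Hypothetical relation probabilities for a two-edge query skeleton,
# e.g. "Which river flows through the capital of France?"
edges = [
    {"dbo:capital": 0.7, "dbo:largestCity": 0.3},
    {"dbo:river": 0.6, "dbo:city": 0.4},
]
best_path, best_score = beam_search(edges, beam_size=2)[0]
print(best_path, round(math.exp(best_score), 2))  # prints ('dbo:capital', 'dbo:river') 0.42
```

The beam width trades accuracy for cost. The work per edge grows with `beam_size` times the candidate count rather than exponentially in the number of edges.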

Time-aware Self-Attention Meets Logic Reasoning in Recommender Systems

Zhijian Luo , Zihan Huang , Jiahui Tang , Yueen Hou , Yanzeng Gao

Category: Artificial Intelligence | Machine Learning

2022-08-29

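In the era of big data, recommender systems have shown remarkable success as a key means of information filtering in our daily lives. In recent years, recommender systems have evolved from perception learning to cognitive reasoning, which frames recommendation as a process of logical reasoning and has achieved significant improvements. However, the logical statements used in such reasoning implicitly assume that ordering is irrelevant. They also ignore time information, which plays an important role in many recommendation tasks. Furthermore, a recommendation model that incorporates temporal context tends to be self-attentive. That is, it automatically focuses more on relevant information and less on irrelevant information. To address these issues, we propose TiSANCR, a recommendation model based on Time-aware Self-Attention with Neural Collaborative Reasoning. It integrates temporal patterns and a self-attention mechanism into reasoning-based recommendation. Specifically, temporal patterns represented by relative time provide context and auxiliary information for characterizing user preferences. Self-attention, in turn, is used to distill informative patterns and suppress irrelevant ones. The fusion of self-attentive temporal information therefore provides a deeper representation of user preferences. Extensive experiments on benchmark datasets show that TiSANCR achieves significant improvements and consistently outperforms state-of-the-art recommendation methods.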
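One common way to make self-attention time-aware is to subtract a penalty from the attention logits that grows with the relative time gap between interactions. Temporally close items then receive more weight. The sketch below shows this approach. The linear bias form and the decay rate `alpha` are hypothetical choices for illustration, since the abstract does not specify how TiSANCR encodes relative time.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def time_aware_attention(query: List[float], keys: List[List[float]],
                         values: List[List[float]], times: List[float],
                         t_query: float, alpha: float = 0.1) -> List[float]:
    """Scaled dot-product attention whose logits are penalized by the
    relative time gap |t_query - t_i| (hypothetical linear bias)."""
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) - alpha * abs(t_query - t)
        for key, t in zip(keys, times)
    ]
    weights = softmax(logits)
    # Weighted sum of value vectors, one output dimension at a time.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

# Two interactions with identical content, at t=1 and t=9; the query is at t=10.
out = time_aware_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]], times=[1.0, 9.0],
                           t_query=10.0, alpha=0.5)
print(out)  # the recent item's value [0.0, 1.0] dominates
```

With `alpha=0` the sketch reduces to ordinary scaled dot-product attention, which makes the effect of the time term easy to isolate.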

Cross Modal Transformer via Coordinates Encoding for 3D Object Detection

Junjie Yan , Yingfei Liu , Jianjian Sun , Fan Jia , Shuailin Li , Tiancai Wang , Xiangyu Zhang

Category: Computer Vision

2023-01-03

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is simple, yet its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains highly robust even when LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
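CMT aligns modalities by encoding 3D coordinates into the same feature space as its tokens. As a rough illustration of turning 3D points into feature vectors, here is a sinusoidal (Fourier-feature) encoding of (x, y, z). This is a swapped-in, widely used technique chosen for simplicity. The abstract does not say CMT uses it, and a learned MLP over coordinates is another common option. The `num_freqs` parameter and the additive injection are hypothetical choices.

```python
import math
from typing import List, Sequence

def encode_point(point: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Map a 3D point to a 6*num_freqs-dim vector of sines and cosines at
    geometrically increasing frequencies (2**k * pi), one block per axis."""
    features: List[float] = []
    for coord in point:
        for k in range(num_freqs):
            angle = (2 ** k) * math.pi * coord
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features

def add_position(token: List[float], point: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Inject position by adding the coordinate encoding to a token feature of equal size."""
    pos = encode_point(point, num_freqs)
    assert len(pos) == len(token), "token dim must equal 6 * num_freqs"
    return [t + p for t, p in zip(token, pos)]
```

If both image tokens and point-cloud tokens carry 3D coordinates, adding such encodings gives the transformer a shared notion of position without an explicit view transformation. How image tokens obtain 3D coordinates is not covered here.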

Backdoor Attacks Against Dataset Distillation

Yugeng Liu , Zheng Li , Michael Backes , Yun Shen , Yang Zhang

Category: Machine Learning

2023-01-03

Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility, and the security risks stemming from them have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation methods in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent them.
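NAIVEATTACK adds triggers to the raw data at the initial distillation phase. The sketch below shows what that step involves, using trigger stamping and the attack-success-rate (ASR) metric, with images represented as plain 2D lists. The patch size, corner placement, poison rate, and target label are hypothetical. The distillation itself and DOORPING's iterative trigger updates are out of scope.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[float]]

def stamp_trigger(img: Image, size: int = 2, value: float = 1.0) -> Image:
    """Return a copy of img with a size x size patch set to value in the bottom-right corner."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    for i in range(h - size, h):
        for j in range(w - size, w):
            out[i][j] = value
    return out

def poison(dataset: List[Tuple[Image, int]], target: int, rate: float,
           seed: int = 0) -> List[Tuple[Image, int]]:
    """Stamp the trigger on a random fraction `rate` of samples and relabel them
    to `target`; in NAIVEATTACK's setting this precedes distillation."""
    rng = random.Random(seed)
    n_poison = int(len(dataset) * rate)
    chosen = set(rng.sample(range(len(dataset)), n_poison))
    return [(stamp_trigger(x), target) if i in chosen else (x, y)
            for i, (x, y) in enumerate(dataset)]

def attack_success_rate(model: Callable[[Image], int], clean: List[Tuple[Image, int]],
                        target: int) -> float:
    """Fraction of triggered non-target inputs that the model classifies as target."""
    tests = [x for x, y in clean if y != target]
    hits = sum(model(stamp_trigger(x)) == target for x in tests)
    return hits / len(tests)
```

A model that has learned the backdoor maps any triggered input to the target class, which yields an ASR close to 1.0. The test below uses a hypothetical toy model that behaves this way.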

Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Li Zhang , Chris Callison-Burch

Category: Natural Language Processing

2023-01-03

Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music. We finetune large language models, pre-trained on a massive text corpus, on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, which have little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
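Fine-tuning a text-pretrained model on MIDI requires serializing drum performances as text. The abstract does not describe the encoding used, so the scheme below is hypothetical. Each hit becomes a token `<bar>-<step>-<instrument>` on a 16-step grid. General MIDI percussion note numbers (36 bass drum, 38 snare, 42 closed hi-hat) are mapped to short names. Decoding inverts the mapping, which is how generated text could be turned back into a groove.

```python
from typing import List, Tuple

# General MIDI percussion note numbers -> short token names.
GM_DRUMS = {36: "kick", 38: "snare", 42: "hat"}
NAME_TO_NOTE = {name: note for note, name in GM_DRUMS.items()}
STEPS_PER_BAR = 16  # sixteenth-note grid in 4/4 (assumption)

Hit = Tuple[int, int]  # (absolute step index, MIDI note number)

def encode(hits: List[Hit]) -> str:
    """Serialize hits as space-separated '<bar>-<step>-<name>' tokens in time order."""
    tokens = []
    for step, note in sorted(hits):
        bar, pos = divmod(step, STEPS_PER_BAR)
        tokens.append(f"{bar}-{pos}-{GM_DRUMS[note]}")
    return " ".join(tokens)

def decode(text: str) -> List[Hit]:
    """Parse tokens back into hits, skipping malformed ones (as a model may emit)."""
    hits = []
    for tok in text.split():
        try:
            bar_str, pos_str, name = tok.split("-")
            step = int(bar_str) * STEPS_PER_BAR + int(pos_str)
            hits.append((step, NAME_TO_NOTE[name]))
        except (ValueError, KeyError):
            continue
    return sorted(hits)

# One bar of a basic rock beat: kick on beats 1 and 3, snare on 2 and 4, hats on eighths.
groove = [(0, 36), (8, 36), (4, 38), (12, 38)] + [(s, 42) for s in range(0, 16, 2)]
print(encode(groove))
```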

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han , Jiangning Zhang , Zhucun Xue , Chao Xu , Xintian Shen , Yabiao Wang , Chengjie Wang , Yong Liu , Xiangtai Li

Category: Computer Vision

2023-01-03

Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants. We introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots. For example, we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be made available.
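RefT's first insight uses support masks to build class centers that re-weight query features, which resembles masked average pooling. The sketch below works under that assumption. Features are per-location vectors, and the class center is the mask-weighted mean of the support features. Each query location is then scaled by its cosine similarity to the center, clipped at 0. This reweighting rule is a hypothetical illustration and is not RefT's dynamic weighting module.

```python
import math
from typing import List

Vec = List[float]

def masked_average(features: List[Vec], mask: List[float]) -> Vec:
    """Class center: mask-weighted mean of support features over foreground locations."""
    total = sum(mask)
    if total == 0:
        raise ValueError("support mask is empty")
    dim = len(features[0])
    return [sum(m * f[d] for f, m in zip(features, mask)) / total for d in range(dim)]

def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)

def reweight_query(query: List[Vec], center: Vec) -> List[Vec]:
    """Scale each query feature by max(0, cos(query, center))."""
    return [[max(0.0, cosine(q, center)) * x for x in q] for q in query]

support = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]  # last location is background
mask = [1.0, 1.0, 0.0]
center = masked_average(support, mask)          # approximately [0.9, 0.1]
print(reweight_query([[2.0, 0.0], [0.0, 3.0]], center))
```

Query locations that resemble the support object keep most of their magnitude, while dissimilar ones are suppressed.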

RELIANT: Fair Knowledge Distillation for Graph Neural Networks

Yushun Dong , Binchi Zhang , Yiling Yuan , Na Zou , Qi Wang , Jundong Li

Category: Machine Learning

2023-01-03

Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
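The abstract combines two ingredients, and this sketch illustrates each using standard definitions. The first is the classic temperature-scaled distillation loss, the KL divergence between softened teacher and student outputs. The second is statistical parity difference, a common group-fairness metric for measuring the bias a student may inherit. How RELIANT actually mitigates bias is not described in the abstract and is not reproduced here. The temperature and toy data are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions,
    multiplied by T^2 (the usual scaling in distillation losses)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def statistical_parity_difference(preds: Sequence[int], groups: Sequence[int]) -> float:
    """|P(pred=1 | group=0) - P(pred=1 | group=1)| for binary predictions and groups."""
    rates = []
    for g in (0, 1):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])

# Toy student predictions for six nodes with a binary sensitive attribute.
print(statistical_parity_difference([1, 1, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1]))  # about 0.333
```

A student can achieve a low `kd_loss` while still showing a large parity gap. That gap is why fairness has to be measured, and optimized, separately from imitation quality.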